ABSTRACT


Rationale is presented for the development of more effective
measures of pattern association that may be determined by direct
evaluation of pattern similarities. A general notation is suggested for
mathematical representation of patterns as multidimensional probability
distributions. With respect to this notation, measures of pattern
distance, pattern dissimilarity, and pattern correlation are developed
that are expressible directly in terms of initial pattern quantizations.
The measure of pattern correlation given may be computed invariant with
respect to individual pattern sizes, positions, and proximate
orientations. The concepts employed would seem well suited for both geometric
and network models of pattern information processing.


1.   Introduction 

The deficiencies of established correlation techniques for
effective quantification of pattern similarities have, to date, greatly
discouraged the development of methodologies of pattern recognition based on
methods of direct comparison [1,2,3]. Clearly, where patterns of the same
class may differ in size, position, orientation, and degree and nature
of distortion, conventional template-matching procedures are inappropriate.
Thus, with exceptions (see, for example, Widrow [4, 5]), an apparent
majority  of  researchers  have  chosen  to  pursue  analytic  methodologies  of 
pattern  recognition,  i.e.,  methodologies  in  which  pattern  classification 
depends  either  upon  analysis  of  transformation  and  deformation  invariant 
pattern  properties,  attributes,  or  features  (e.g.,  statistical  methods) 
or  upon  analysis  of  invariant  structural  relationships  between  pattern 
components  (e.g.,  syntactic  methods). 

There  remains,  however,  in  philosophical  opposition  to  all 
analytic  methodologies  of  pattern  recognition  the  basic  hypothesis  of 
gestalt--that  there  exist,  as  the  most  elementary  units  of  perception, 
holistic  organizations  of  phenomena,  unitary  perceptual  entities,  or 
wholes  whose  phenomenological  characters  defy  analytic  description  and 
are only apprehensible directly. Under this assumption, patterns themselves
are necessarily their only valid characterizations. To the extent
then  that  in  a  particular  context  meaningful  categories  of  patterns 
derive  directly  from  basic  similarities  of  gestalt,  we  must  consider  all 
analytic  methodologies  of  pattern  recognition  inappropriate  to  the  task 
at  hand. 

Adopting  philosophically  the  premise  that  patterns  are  their 
own  most  valid  characterizations,  while  acknowledging  the  inadequacies 
of  conventional  correlation  methods  of  pattern  similarity  measurement, 
we  consider  a  fundamental  problem  of  pattern  information  processing 
research  to  be  the  development  of  more  general  and  more  effective 
measures  of  pattern  association  that  may  be  expressed  and  computed 
directly in terms of initial pattern quantizations.

Relying greatly on mathematical concepts long employed by
social scientists for modeling economic and social interaction patterns
within urban and regional environments, below we suggest a general
representation of quantized patterns as probabilistic spatial
distributions of information and, with respect to a particular mathematical
notation, develop numerical indices of pattern distance, pattern
dissimilarity, and pattern correlation that are expressible directly for
any two patterns so defined. These measures of pattern association are
first developed geometrically for planar pictorial patterns as
coefficients of spatial congruence between pairs of two-dimensional
probability distributions. The indices presented, however, appear
applicable as congruence measures for multidimensional probability
distributions in general. In particular, the index of pattern
correlation presented is invariant with respect to individual pattern sizes,
positions, and proximate orientations and continuous with respect to
individual pattern deformations. The concepts employed point toward
general network models of pattern information processing that permit
conceptualization of both patterns and associations between patterns as
probabilistic network distributions of pattern-specific information
quanta.

2.   Patterns  and  Pattern  Distance 

While  generalities  surrounding  the  concept  pattern  make 
difficult  any  single  definition,  to  assist  the  present  mathematical 
discussion  we  offer  the  following:   a  pattern  is  a  unitary  organized  set 
of  quantized  information  whose  probabilistic  spatial  (and/or  temporal) 
distribution  over  some  set  of  sampling  elements  characterizes  some  more 
complex phenomenon source.

If we adopt at least provisionally the above definition, we
may represent mathematically any particular pattern f as a partitioned
array $(W|X)_f$, as tall as there are sampling elements of f, where W is
a matrix of coordinates indicating the relative spatial positions of
all elements of f, and X is a vector of positive reals indicating the
proportional distribution of quantized units of information across all
pattern elements. For lack of any existing term, we will refer generally
to these quantum units of pattern information, whose distribution
characterizes a particular pattern source, simply as pattern quits
(quantization units). Also, for mathematical convenience we will assume
normalization of pattern intensities, i.e., equalization of recorded
quit totals (or in the case of pictorial patterns normalization of
overall levels of brightness or darkness), so that $\sum_i x_i = 1$. Thus
the representation of a particular pattern f given by $(W|X)_f$ may be
considered a discrete probability distribution X of pattern quits over
a spatially arrayed set of sampling elements with centroids W. (See
Figure 1.)

Figure 1. Example pattern quantization and mathematical representation.
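The representation just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the pattern is taken to be a small grid of non-negative intensities, each non-zero cell is treated as one sampling element with its centroid at the cell center, and the name `quantize_pattern` is hypothetical rather than part of the original method.

```python
def quantize_pattern(image):
    """Return (W, X): element centroids and normalized quit proportions.

    `image` is a list of rows of non-negative numbers. Cells with zero
    intensity are dropped so that X contains positive reals only.
    """
    cells = [(col + 0.5, row + 0.5, value)
             for row, line in enumerate(image)
             for col, value in enumerate(line) if value > 0]
    total = sum(value for _, _, value in cells)
    W = [(cx, cy) for cx, cy, _ in cells]         # coordinates of centroids
    X = [value / total for _, _, value in cells]  # proportions summing to one
    return W, X


# A vertical stroke whose lowest cell is twice as dark as the others.
image = [[0, 1, 0],
         [0, 1, 0],
         [0, 2, 0]]
W, X = quantize_pattern(image)
print(W)
print(X)
```

Dropping empty cells keeps X strictly positive, as the definition of X requires; any other sampling scheme that yields centroids and positive proportions would serve equally well.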


With such mathematical notation we consider the following
pattern association measurement problem: given a set of patterns F,
determine a symmetric non-negative scalar index of pattern distance,
$D_{f,g} = d[(W|X)_f, (Y|Z)_g]$, pairwise computable for all $f \in F$ and
$g \in F$, such that $D_{f,g}$ approaches zero as the spatial congruence of the
probability distributions of f and g increases.

For any two patterns f and g quantized in terms of the
notation given above we may establish such an index of pattern distance
in the following manner. We may determine a weighted correspondence of
elements between f and g such that there exists maximal proximity
between elements corresponding between f and g. We may then take as
our measure of pattern distance the weighted sum of squared distances
between all pairs of elements corresponding between f and g where the
weights of the sum reflect the degree of correspondence between each
element pair.

Let the two patterns f and g consist of m and n elements
respectively and let their quantized representations be denoted $(W|X)_f$
and $(Y|Z)_g$. We represent a particular weighted correspondence of
elements between f and g as a matrix Q (m x n) satisfying

(1)  $\sum_{i=1}^{m} q_{i,j} = z_j$,   j = 1, ..., n

(2)  $\sum_{j=1}^{n} q_{i,j} = x_i$,   i = 1, ..., m

(3)  $q_{i,j} \ge 0$,   i = 1, ..., m;  j = 1, ..., n.

Let $\pi_{f,g}$ denote the set of all Q matrices satisfying (1), (2), and (3)
for given X of f and Z of g. Now by normalization of pattern intensities
$\sum_i x_i = \sum_j z_j = 1$, hence $\sum_i \sum_j q_{i,j} = 1$. Since also
$q_{i,j} \ge 0$ for all i and j, we may consider any $Q \in \pi_{f,g}$ to be a
discrete joint probability
distribution of "quit correspondences" between the elements of f and the
elements of g. Alternatively, any $Q \in \pi_{f,g}$ represents a probabilistic
matching or connection of the quits of f with the quits of g.
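As a small illustration of constraints (1), (2), and (3), the sketch below checks whether a candidate matrix is an admissible correspondence. The function name `is_correspondence` and the tolerance are assumptions made for the example. It also shows that the product distribution $q_{i,j} = x_i z_j$ always satisfies the constraints, so the admissible set is never empty.

```python
def is_correspondence(Q, X, Z, tol=1e-9):
    """Check (1) column sums equal Z, (2) row sums equal X, (3) entries >= 0."""
    m, n = len(X), len(Z)
    cols_ok = all(abs(sum(Q[i][j] for i in range(m)) - Z[j]) <= tol
                  for j in range(n))
    rows_ok = all(abs(sum(Q[i]) - X[i]) <= tol for i in range(m))
    nonneg = all(Q[i][j] >= 0 for i in range(m) for j in range(n))
    return cols_ok and rows_ok and nonneg


X = [0.5, 0.5]            # quit proportions of f (m = 2)
Z = [0.25, 0.25, 0.5]     # quit proportions of g (n = 3)

# The product distribution is always admissible, though rarely optimal.
Q_indep = [[x * z for z in Z] for x in X]

# This matrix has the right row sums but the wrong column sums.
Q_bad = [[0.5, 0.0, 0.0],
         [0.0, 0.0, 0.5]]

print(is_correspondence(Q_indep, X, Z))
print(is_correspondence(Q_bad, X, Z))
```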

Now assuming fixed geometries for f and g (for example, a font
recognition problem where all quantized patterns may be assumed
standardized with respect to positions, sizes, and orientations), let S be the
matrix of squared distances between all elements of f and all elements
of g given directly by

(4)  $s_{i,j} = \sum_k (w_{i,k} - y_{j,k})^2$,   i = 1, ..., m;  j = 1, ..., n.
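A direct transcription of (4) might look as follows. This is a sketch only; the function name `squared_distances` and the sample coordinates are illustrative assumptions.

```python
def squared_distances(W, Y):
    """Return S with s[i][j] = sum over k of (w[i][k] - y[j][k]) ** 2."""
    return [[sum((wk - yk) ** 2 for wk, yk in zip(w, y)) for y in Y]
            for w in W]


W = [(0.0, 0.0), (1.0, 0.0)]               # m = 2 elements of f
Y = [(0.0, 1.0), (2.0, 0.0), (1.0, 1.0)]   # n = 3 elements of g
S = squared_distances(W, Y)
print(S)
```

Because the inner sum runs over however many coordinates each row carries, the same function serves patterns of any spatial dimension.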

Our formulation of pattern distance between f and g is then

(5)  $D_{f,g} = \min_{Q \in \pi_{f,g}} \sum_{i=1}^{m} \sum_{j=1}^{n} q_{i,j} s_{i,j} = \min_{Q \in \pi_{f,g}} \mathrm{tr}(Q'S)$

where again $\pi_{f,g}$ is the set of all Q matrices satisfying (1), (2), and
(3). Note that such a measure of pattern distance may be interpreted
as a minimal mean squared distance of spatial separation between
corresponding quits of f and g.

Now the optimization problem given by (1), (2), (3), and (5),
where $\sum_i x_i = \sum_j z_j = \gamma$ but $\gamma$ not necessarily unity, may be recognized as
the Hitchcock or transportation problem of linear programming [6,7,8].
Typically, the problem requires determination of a matching between a
spatially distributed set of economic supplies and a spatially distributed
set of demands such that the total cost of all material movements from
suppliers to buyers is minimal. For such problems, computational
algorithms are well known and solution properties well documented [9,10].
Thus,  a  variety  of  computational  procedures  exist  that  can  be  employed 
to  determine  simultaneously  an  optimal  set  of  weighted  correspondences 
between  pattern  elements  (an  extremal  joint  probability  distribution  of 
quit  correspondences)  and  the  minimal  value  of  pattern  distance  yielded 
by  these  correspondences. 
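For very small patterns the pattern distance (5) can be illustrated without a linear programming code. The sketch below is restricted to a special case, stated here as an assumption: both patterns have the same number n of elements and uniform proportions 1/n. In that case some optimal Q is 1/n times a permutation matrix, so an exhaustive search over permutations suffices. The function name `pattern_distance_uniform` is hypothetical, and the search grows factorially with n, so this is not a substitute for a transportation algorithm.

```python
from itertools import permutations


def pattern_distance_uniform(W, Y):
    """Pattern distance (5) for equal-sized patterns with uniform weights 1/n.

    Returns (D, Q), where Q is an optimal correspondence matrix.
    """
    n = len(W)
    if len(Y) != n:
        raise ValueError("this sketch requires m == n")
    # Squared distances as in (4).
    S = [[sum((a - b) ** 2 for a, b in zip(w, y)) for y in Y] for w in W]
    # Exhaustive search over one-to-one correspondences.
    best_cost, best_perm = min(
        (sum(S[i][p[i]] for i in range(n)) / n, p)
        for p in permutations(range(n)))
    Q = [[1.0 / n if best_perm[i] == j else 0.0 for j in range(n)]
         for i in range(n)]
    return best_cost, Q


W = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
Y = [(2.0, 1.0), (0.0, 1.0), (1.0, 1.0)]   # same row, shifted up, reordered
D, Q = pattern_distance_uniform(W, Y)
print(D)
print(Q)
```

For general proportions X and Z, or for m different from n, a transportation or general linear programming routine would be needed instead.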

We  may  note  also  at  this  point  that  the  measure  of  pattern 
distance  presented  should  be  useful  not  only  for  pattern  recognition 
applications  per  se  but  also  for  numerous  other  applications  where 
there is needed some composite scalar measure of the spatial congruence
of pairs of probability distributions. For example, the measure
presented  would  seem  well  suited  as  a  measure  of  ecological  association 
between  spatially  distributed  populations  of  social  and  biotic  communities 
within  ecosystem  analyses  [11,12]. 

3.   Pattern  Dissimilarity 

The index of pattern distance presented above provides a
measure of the spatial congruence of patterns under the assumption that
individual pattern positions, sizes, and orientations may be regarded as
standardized or, for other reasons, must be taken as fixed. For most
pattern recognition applications, however, no such conditions will prevail.
Hence, we remain faced with the problem: given a set of patterns F whose
overall characters may be considered independent of individual pattern
positions, sizes, and orientations, determine a symmetric non-negative
index of pattern dissimilarity, $\Delta_{f,g} = \delta[(W|X)_f, (Y|Z)_g]$, pairwise
computable for all $f \in F$ and $g \in F$, such that $\Delta_{f,g}$ approaches zero as the
similarity of f and g increases.

As  an  extension  of  the  method  presented  above  for  measurement 
of in situ pattern congruence, we establish an index of pattern
dissimilarity in the following manner. We determine not only a weighted
correspondence  of  elements  between  f  and  g  but  also  a  spatial  registration 
of  f  with  respect  to  g  such  that  there  results  maximal  spatial  congruence 
of  elements  corresponding  between  f  and  g.   We  then  take  as  our  criterion 
of  pattern  dissimilarity  the  weighted  sum  of  resulting  squared  distances 
between  all  pairs  of  elements  corresponding  between  f  and  g  where  the 
weights  of  the  sum  again  reflect  the  extent  of  correspondence  determined 
for  each  pair  of  pattern  elements. 

Let a particular spatial registration of f with respect to g
be denoted $\alpha W R + J T'$ where J is the vector (1, ..., 1)', T is a
translation vector, $\alpha$ is a scale factor, and R is any additional legitimate
linear transformation, e.g. a proper rotation. For a given registration
of f with respect to g, let S (m x n) be the resulting matrix of squared
distances between the elements of f and the elements of g,

(6)  $s_{i,j} = \sum_k [\alpha (\sum_l w_{i,l} r_{l,k} - t_k) - y_{j,k}]^2$,   i = 1, ..., m;  j = 1, ..., n.

Let $\Sigma_{f,g}$ denote the set of S matrices obtainable for f and g by (6) over
all positive scalars $\alpha$, all translations T, and all legitimate
transformations R.

Our criterion of pattern dissimilarity may then be formulated:

(7)  $\Delta_{f,g} = \min_{Q \in \pi_{f,g},\, S \in \Sigma_{f,g}} \sum_{i=1}^{m} \sum_{j=1}^{n} q_{i,j} s_{i,j} = \min_{Q \in \pi_{f,g},\, S \in \Sigma_{f,g}} \mathrm{tr}(Q'S)$.

Note that such an index of pattern dissimilarity may be interpreted
theoretically as a minimal mean squared error of registration between all
corresponding quits of f and g.

In the following sections we present a number of alternative
mathematical techniques which, in specific combinations, provide
computational solutions to (7). The methods developed all yield numerical
estimates for $\Delta_{f,g}$ via iterative solution of the two interdependent
subproblems implied by (7) -- the correspondences problem requiring
minimization of $\Delta_{f,g}$ over all $Q \in \pi_{f,g}$ for fixed S, and the transformation
problem which requires determination of that spatial registration of f
with respect to g that minimizes $\Delta_{f,g}$ over all $S \in \Sigma_{f,g}$ for fixed Q.
Since, as pointed out above, we already have at hand established linear
optimization techniques for computational solution of the correspondences
problem, let us now turn to analysis of the specific transformations
required for optimal spatial registration of patterns within pairwise
comparisons.

4.   The Transformation Problem and Normalization Procedures

Let f and g be two patterns, again with quantizations $(W|X)_f$
and $(Y|Z)_g$, taken from a set of patterns F not assumed to be standardized
with respect to individual pattern sizes, positions, and orientations.
On the contrary, in this section let us assume that pattern positions
and sizes are arbitrary and that also individual pattern orientations
may include considerable rotational displacements from prototypical axial
alignments. Our problem is to determine that set of translational, scale,
and rotational transformations that will bring about that particular
spatial registration of f and g, and hence that particular matrix of
squared distances S via (6), such that our previous measure of pattern
distance, $D_{f,g}$ via (5), might be determined as a minimum over all
$S \in \Sigma_{f,g}$ as well as over all $Q \in \pi_{f,g}$. This is precisely the meaning of
our measure of pattern dissimilarity $\Delta_{f,g}$ as given via (7).

Now regarding Q as given, consider all possible translations of
f with respect to g and with reference to (5) write

(8)  $D = \sum_{i=1}^{m} \sum_{j=1}^{n} q_{i,j} \sum_k [(w_{i,k} - t_k) - y_{j,k}]^2$.

To determine the particular T that minimizes D over all translations of f
with respect to g, differentiate (8) with respect to $t_k$ to obtain

(9)  $\partial D / \partial t_k = -2 \sum_i \sum_j q_{i,j} [(w_{i,k} - t_k) - y_{j,k}]$

from which it may be determined that $t_k = 0$ where
$\sum_i \sum_j q_{i,j} w_{i,k} = \sum_i \sum_j q_{i,j} y_{j,k}$, or where
$\sum_i x_i w_{i,k} = \sum_j z_j y_{j,k}$. This condition
implies that an optimal registration between f and g requires a
coincidence of pattern centroids. Let us therefore normalize the positions of
both f and g so that centroids are coincident at a common origin, i.e.,
so that $\sum_i x_i w_{i,k} = \sum_j z_j y_{j,k} = 0$ for each spatial dimension k. Since
these new centroids remain invariant over any additional scale and
rotational transformations, we may conclude that no further consideration
of pattern translations is necessary in minimizing $\Delta_{f,g}$.
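The position normalization just derived amounts to subtracting the weighted centroid from every element. A minimal sketch follows; the function name `center_pattern` and the sample values are assumptions made for illustration.

```python
def center_pattern(W, X):
    """Translate W so that its X-weighted centroid lies at the origin."""
    dims = range(len(W[0]))
    centroid = [sum(x * w[k] for w, x in zip(W, X)) for k in dims]
    return [tuple(w[k] - centroid[k] for k in dims) for w in W]


W = [(1.0, 1.0), (3.0, 1.0), (3.0, 5.0)]
X = [0.5, 0.25, 0.25]
Wc = center_pattern(W, X)
print(Wc)

# After centering, the weighted sum of each coordinate is zero.
print([sum(x * w[k] for w, x in zip(Wc, X)) for k in range(2)])
```

Applying the same function to (Y, Z) places both centroids at the common origin, which is the condition required above.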
Now consider all possible positive scale factors $\alpha$ applied
to f so that

(10)  $D = \sum_{i=1}^{m} \sum_{j=1}^{n} q_{i,j} \sum_k (\alpha w_{i,k} - y_{j,k})^2$.

Differentiating D with respect to $\alpha$ we find that the particular $\alpha$
minimizing (10) is given by

(11)  $\alpha = [\sum_i \sum_j q_{i,j} \sum_k w_{i,k} y_{j,k}] / [\sum_i \sum_j q_{i,j} \sum_k w_{i,k}^2]$.

If $\alpha$ scales optimally W with respect to Y, then by symmetry, $\alpha^{-1}$ scales
optimally Y with respect to W. By an identical analysis we may determine
for $\alpha^{-1}$ the expression

(12)  $\alpha^{-1} = [\sum_i \sum_j q_{i,j} \sum_k y_{j,k}^2] / [\sum_i \sum_j q_{i,j} \sum_k w_{i,k} y_{j,k}]$.

Hence $\alpha = \alpha^{-1} = 1$ where
$\sum_i \sum_j q_{i,j} \sum_k w_{i,k}^2 = \sum_i \sum_j q_{i,j} \sum_k y_{j,k}^2$, or where
$\sum_i x_i \sum_k w_{i,k}^2 = \sum_j z_j \sum_k y_{j,k}^2$. This condition implies that optimal
registration of f and g requires equality of pattern second moments about
the origin. Let us therefore normalize the sizes of both f and g so that
second moments equal unity, i.e., so that
$\sum_i x_i \sum_k w_{i,k}^2 = \sum_j z_j \sum_k y_{j,k}^2 = 1$.
Since these second moments remain unchanged over any rotational
transformations that may be required, in minimizing $\Delta_{f,g}$ we may now also exclude
all further consideration of pattern sizes.
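The size normalization can be sketched in the same style. The example assumes the pattern has already been centered at its weighted centroid. The function names `second_moment` and `normalize_size` are illustrative, not taken from the original.

```python
import math


def second_moment(W, X):
    """Weighted second moment about the origin: sum_i x_i * sum_k w_ik ** 2."""
    return sum(x * sum(c * c for c in w) for w, x in zip(W, X))


def normalize_size(W, X):
    """Rescale a centered pattern so that its second moment equals one."""
    scale = 1.0 / math.sqrt(second_moment(W, X))
    return [tuple(scale * c for c in w) for w in W]


W = [(-1.0, -1.0), (1.0, -1.0), (1.0, 3.0)]   # already centered
X = [0.5, 0.25, 0.25]
Wn = normalize_size(W, X)
print(second_moment(W, X))    # moment before rescaling
print(Wn)
print(second_moment(Wn, X))   # moment after rescaling
```

Since the rescaling is about the origin, a centered pattern stays centered, so the two normalizations can be applied in sequence.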

The above analysis demonstrates that transformed pattern
positions and sizes, optimal with respect to the minimal $\Delta_{f,g}$, may be
determined directly by normalization procedures independent of whatever
correspondences Q are defined between the elements of f and g and
independent of whatever rotation R may be chosen to effect maximum
spatial congruence of corresponding elements. The specific normalization
procedures given might be profitably included as a pre-processing step
with the generalized template-matching technique given above in Section
2, providing normalized measures of pattern distance for all pairwise
comparisons. In the present context, it remains to determine that
particular R and that particular Q (which as we shall see are
interdependent) that together yield a minimum value of pattern dissimilarity
$\Delta_{f,g}$.

5.   Rotation  to  Maximum  Pattern  Correlation 

To determine simultaneously the rotation R effecting maximal
spatial congruence of normalized patterns f and g and the matrix of
correspondences Q that together yield a minimal value of pattern
dissimilarity $\Delta_{f,g}$, we adopt a hill-climbing computational procedure.
With respect to initial orientations of f and g, we compute a first
estimate of S via (4) and then a first estimate of Q via (5). Then,
with these initial correspondences fixed, we may determine a first
estimate of the optimal rotation R in the following manner.

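Note that, for f and g normalized and Q fixed, our stepwise
optimal value of pattern dissimilarity may be formulated

(13)  $\Delta = \min_{R \in \Theta} \mathrm{tr}(Q'S)$

where our problem is again to determine a rotation $R \in \Theta$ ($\Theta$ the set of
proper planar rotations) yielding a new estimate of S via (6) stepwise
optimal with respect to Q.

Define $\underline{Q}$ and $\underline{S}$ as mn x mn diagonal matrices such that

(14)  $(\underline{q}_{1,1}, \underline{q}_{2,2}, \ldots, \underline{q}_{mn,mn}) = (q_{1,1}, q_{1,2}, \ldots, q_{m,n})$

and

(15)  $(\underline{s}_{1,1}, \underline{s}_{2,2}, \ldots, \underline{s}_{mn,mn}) = (s_{1,1}, s_{1,2}, \ldots, s_{m,n})$.

Then we may express (13) alternatively as

(16)  $\Delta = \min_{R \in \Theta} \mathrm{tr}(Q'S) = \min_{R \in \Theta} \mathrm{tr}(\underline{Q}\,\underline{S})$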

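where the elements of $\underline{S}$ remain to be determined as a function of the
unknown R.

Also define $\underline{W}$ to be mn x k where the first row of W is repeated
n times as the first n rows of $\underline{W}$, the second row of W repeated as the
next n rows of $\underline{W}$ and so forth. Define $\underline{Y}$ to be mn x k where the entire
matrix Y is simply repeated vertically m times.

Now note that ordered diagonal elements of the matrix
$[(\underline{W}R - \underline{Y})(\underline{W}R - \underline{Y})']$ are identically equal to the elements of $\underline{S}$, hence we may
write

(17)  $\mathrm{tr}[(\underline{W}R - \underline{Y})(\underline{W}R - \underline{Y})'] = \mathrm{tr}(\underline{S})$

and since $\underline{Q}$ is also diagonal we may restate (16) as

(18)  $\Delta = \min_{R \in \Theta} \mathrm{tr}[\underline{Q}(\underline{W}R - \underline{Y})(\underline{W}R - \underline{Y})']$.

Defining $\underline{Q}^{1/2}$ such that $\underline{Q}^{1/2}\underline{Q}^{1/2} = \underline{Q}$, we may write

(19)  $\Delta = \min_{R \in \Theta} \mathrm{tr}[\underline{Q}^{1/2}(\underline{W}R - \underline{Y})(\underline{W}R - \underline{Y})'\underline{Q}^{1/2}]$

and after manipulation,

(20)  $\Delta = \min_{R \in \Theta} \mathrm{tr}[(\underline{Q}^{1/2}\underline{W}R - \underline{Q}^{1/2}\underline{Y})'(\underline{Q}^{1/2}\underline{W}R - \underline{Q}^{1/2}\underline{Y})]$.

Now substitute $\tilde{W} = \underline{Q}^{1/2}\underline{W}$ and $\tilde{Y} = \underline{Q}^{1/2}\underline{Y}$ into (20) to obtain

(21)  $\Delta = \min_{R \in \Theta} \mathrm{tr}[(\tilde{W}R - \tilde{Y})'(\tilde{W}R - \tilde{Y})]$

which may be written
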

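(22)  $\Delta = \min_{R \in \Theta} \mathrm{tr}(R'\tilde{W}'\tilde{W}R - 2R'\tilde{W}'\tilde{Y} + \tilde{Y}'\tilde{Y})$

or, since the trace of a sum equals the sum of the traces,

(23)  $\Delta = \min_{R \in \Theta} [\mathrm{tr}(R'\tilde{W}'\tilde{W}R) - 2\,\mathrm{tr}(R'\tilde{W}'\tilde{Y}) + \mathrm{tr}(\tilde{Y}'\tilde{Y})]$.

Now consider the first and last terms of (23). Clearly both
are independent of R. Furthermore, by normalization and our definitions
of $\underline{W}$, $\underline{Y}$, $\tilde{W}$ and $\tilde{Y}$ we may write

(24)  $\mathrm{tr}(R'\tilde{W}'\tilde{W}R) = \mathrm{tr}(\tilde{W}'\tilde{W}) = \mathrm{tr}[(\underline{Q}^{1/2}\underline{W})'(\underline{Q}^{1/2}\underline{W})] = \sum_i \sum_j q_{i,j} \sum_k w_{i,k}^2 = \sum_i x_i \sum_k w_{i,k}^2 = 1$

and

(25)  $\mathrm{tr}(\tilde{Y}'\tilde{Y}) = \mathrm{tr}[(\underline{Q}^{1/2}\underline{Y})'(\underline{Q}^{1/2}\underline{Y})] = \sum_i \sum_j q_{i,j} \sum_k y_{j,k}^2 = \sum_j z_j \sum_k y_{j,k}^2 = 1$.

Since also the middle term of (23) may be written
$-2\,\mathrm{tr}[R'(\underline{Q}^{1/2}\underline{W})'(\underline{Q}^{1/2}\underline{Y})] = -2\,\mathrm{tr}(R'\underline{W}'\underline{Q}\,\underline{Y})$, we have shown that our rotation problem may be formulated
equivalently as

(26)  $\Delta = 2 - 2 \max_{R \in \Theta} \mathrm{tr}(R'\underline{W}'\underline{Q}\,\underline{Y})$

or,

(27)  $\Delta = 2 - 2 \max_{R \in \Theta} \mathrm{tr}(R'W'QY)$.
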


With reference to (27) we notice that solution of (7) is
equivalent to solution of

(28)  $P_{f,g} = \max_{R \in \Theta,\, Q \in \pi_{f,g}} \mathrm{tr}(R'W'QY)$

where there exist the inverse monotonic mappings $\Delta_{f,g} = 2 - 2P_{f,g}$ and
$P_{f,g} = 1 - \frac{1}{2}\Delta_{f,g}$. Since $\Delta_{f,g}$ is formulated as a mean sum of squared
distances, its lower bound is zero, hence the upper bound for $P_{f,g}$ is
unity. We may thus refer to $P_{f,g}$ as the pattern correlation of f and g
and solve (28) as an alternative to (7).

Now  if  we  allow  R  to  be  any  orthogonal  transformation,  i.e., 
either  a  proper  or  improper  rotation,  the  optimization  problem  given  by 
any  of  the  above  formulations  of  our  pattern  transformation  problem 
may  be  recognized  as  the  Procrustes  problem  of  psychometrics  [13, 14].   The 
problem arises in factor analysis and multidimensional scaling
applications where it is desirable to compare two sets of factor coordinates
independently  determined  for  the  same  set  of  variables  by  rotation  of 
one  set  to  maximum  spatial  congruence  with  the  other  to  maximize  between- 
set  factor  correlations.  Mathematically,  the  problem  is  closely  related 
to  the  canonical  correlation  problem  of  multivariate  analysis. 

It has been shown then that for our present problem where W and
Y and hence $\underline{W}$, $\underline{Y}$, $\tilde{W}$ and $\tilde{Y}$ are of full-column rank k, an optimal orthogonal
transformation R is given by

(29)  $R = (H L^{-1/2} H')\,\tilde{W}'\tilde{Y}$

where H (k x k) and L (k x k diagonal) represent respectively the
eigenvectors and the eigenvalues of the matrix $(\tilde{W}'\tilde{Y}\tilde{Y}'\tilde{W})$ [15]. Since both
$\tilde{W}$ and $\tilde{Y}$ are of rank k, all roots of $(\tilde{W}'\tilde{Y}\tilde{Y}'\tilde{W})$ are positive and we may take
as the elements of $L^{-1/2}$ the reciprocals of the positive square roots of
the elements of L [13].

While it is true that computation of R via (29) optimizes $\Delta$
over all orthogonal transformations, definition of element correspondences
Q via (4) (or via (6) where R occurs as a small proper rotation) makes
it extremely improbable that the maximum of tr(R'W'QY) will now occur
for an improper rotation -- that is, a reflection of f about some axis
as well as a proper rotation of f with respect to g. Exceptions to this
rule occur when comparing patterns whose coordinate matrices, W or Y,
are only weakly of full-column rank, i.e., patterns of nominal
full-column rank k whose spatial geometries can be accommodated with only
slight distortion in a subspace of dimensionality k' < k. For example,
where two patterns being compared represent quantized left and right
parentheses, "(" and ")", and where Q has been given by (4), we would
expect the R given by (29) to contain a horizontal reflection. On the
other hand, if the two patterns being compared are "M" and "W", both
patterns strongly two-dimensional, we would not expect an R computed via
the same method to contain a vertical reflection. In any case, where
pattern reflections are significant, the determinant of the matrix R
(det R) may be computed to detect improper rotations and further action
may be undertaken appropriate to the specific application.
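For planar patterns the rotation step can be illustrated with a closed form that searches proper rotations only. This is a substitute for the eigenvector solution (29), not a transcription of it. Writing $M = W'QY$ and $R = [[c, -s], [s, c]]$, the trace tr(R'M) equals $c(m_{11} + m_{22}) + s(m_{21} - m_{12})$, which is maximal at the angle computed below. The function name `best_planar_rotation` and the test pattern are assumptions made for the example.

```python
import math


def best_planar_rotation(W, Y, Q):
    """Return (theta, R, P) maximizing tr(R' W' Q Y) over proper planar rotations."""
    # M = W' Q Y, a 2 x 2 matrix of weighted cross-products.
    M = [[sum(Q[i][j] * W[i][a] * Y[j][b]
              for i in range(len(W)) for j in range(len(Y)))
          for b in range(2)] for a in range(2)]
    # With R = [[c, -s], [s, c]]: tr(R'M) = c*(M11 + M22) + s*(M21 - M12).
    theta = math.atan2(M[1][0] - M[0][1], M[0][0] + M[1][1])
    c, s = math.cos(theta), math.sin(theta)
    R = [[c, -s], [s, c]]          # det R = +1 by construction
    P = c * (M[0][0] + M[1][1]) + s * (M[1][0] - M[0][1])
    return theta, R, P


# Example: g is the normalized pattern f rotated by 0.3 radians,
# with one-to-one correspondences between matching elements.
W = [(-0.5, -0.5), (0.5, -0.5), (0.5, 1.5)]
X = [0.5, 0.25, 0.25]
c0, s0 = math.cos(0.3), math.sin(0.3)
Y = [(w0 * c0 + w1 * s0, -w0 * s0 + w1 * c0) for w0, w1 in W]  # rows of W R0
Q = [[X[i] if i == j else 0.0 for j in range(3)] for i in range(3)]

theta, R, P = best_planar_rotation(W, Y, Q)
dissimilarity = 2.0 - 2.0 * P      # mapping between (27) and (28)
print(theta, P, dissimilarity)
```

Because the search is restricted to proper rotations, no determinant check is needed here. An implementation of (29) itself would admit reflections and would need the check on det R described above.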

In the last section, we presented a method for determining Q,
optimal with respect to an assumed S. In this section, we have shown
how an optimal transformation R, and hence an optimal S, may be determined
with respect to a given Q. Since both subproblems are formulated to
optimize the same criterion $\Delta_{f,g}$ (or $P_{f,g}$), iterative solution of both
yields a value of $\Delta_{f,g}$ optimal at least locally over all $S \in \Sigma_{f,g}$ and
$Q \in \pi_{f,g}$. Thus given a set of quantized patterns F for which rotational
displacements can be assumed small, a numerically expressible procedure
exists for determining $\Delta_{f,g}$ for all $f \in F$ and all $g \in F$.

6.   The  Network  Entropy  Formulation  of  the  Correspondences  Problem 

In Section 2 above, we noted that the problem of determining
an optimal set of weighted correspondences between the elements of two
patterns (an extremal probabilistic matching of pattern quits) can be
solved via well known linear programming techniques, specifically via
Hitchcock or transportation algorithms. While any of these computational
procedures may be used within a variety of classification methods based
on $D_{f,g}$ and $\Delta_{f,g}$, the computational characteristics of pattern information
processing in general compel us to look further for a numerical expression
of our correspondences problem of a structure more appropriate to parallel
computation.

Here  our  motivation  stems  from  two  sources.   For  technical  and 
economic  reasons,  we  wish  to  explore  the  applicability  of  the  pattern 
recognition methodology presented for special-purpose hardware
implementations. For purely scientific and philosophical reasons, we wish not
to overlook any meaningful analogies between the numerical methodology
itself and naturally-occurring pattern information processes.

Now a problem closely related to the Hitchcock problem (and
well known to urban and regional transportation planners) is called the
entropy network distribution problem [16, 17]. The problem arises where
it is desirable to simulate traffic flows within a metropolitan region
given data describing distributions of populations and economic activities
over some set of analysis zones subdividing the region, zone-to-zone
travel times, and estimates of mean travel times for specific types of
trips within the region. Borrowing the notation of our pattern
correspondences problem, let $\Delta$ represent the mean travel time for all home-work
commuting trips, let X be the probability distribution of workers over
m residential zones, let Z be the distribution of jobs over n employment
zones, and let S be a matrix of network travel times between any
residential zone and any employment zone. The problem requires
determination of a most probable, mean, or maximum entropy joint probability
distribution Q with marginals X and Z such that each element $q_{i,j}$
represents the forecasted proportion of all trips occurring between the
i-th residential zone and the j-th employment zone. Mathematically, the
problem is formulated

(30)  $\max H = -\sum_{i=1}^{m} \sum_{j=1}^{n} q_{i,j} \log q_{i,j}$

subject to (1), (2), and (3) (as given in Section 2 above) and the
additional mean-travel-time constraint

(31)  $\sum_{i=1}^{m} \sum_{j=1}^{n} q_{i,j} s_{i,j} = \Delta$.

Note that constraint (31) may be considered simply as an a priori
specification of overall network distribution efficiency or total energy
expenditure.

The solution to the problem is given by

(32)  $q_{i,j} = x_i u_i z_j v_j \exp(-\beta s_{i,j})$,   i = 1, ..., m;  j = 1, ..., n

where $\beta$ represents the Lagrange multiplier associated with constraint
(31) and the $u_i$ and $v_j$ are functions of the Lagrange multipliers
associated with constraint sets (1) and (2). It has been shown [18] that
corresponding to any real $\beta$ there exists a unique Q maximizing (30) and
satisfying (1), (2), and (3) given by (32) where the parameters $u_i$ and $v_j$
may be determined by iterative solution of the equations

(33)  $u_i = [\sum_{j=1}^{n} z_j v_j \exp(-\beta s_{i,j})]^{-1}$,   i = 1, ..., m

(34)  $v_j = [\sum_{i=1}^{m} x_i u_i \exp(-\beta s_{i,j})]^{-1}$,   j = 1, ..., n.

Additionally, it has been shown that there exists a monotonic mapping
between all $\beta$ and all feasible A such that as $\beta$ approaches
$-\infty$, A approaches $A_{\max}$, and as $\beta$ approaches $+\infty$,
A approaches $A_{\min}$, where $A_{\max}$ and $A_{\min}$, respectively,
denote the maximum and minimum values of A feasible for given S, X, and Z
[19, 20]. Together these results yield a theoretical basis for iterative
determination of the unique Q maximizing (30) and satisfying a particular
feasible efficiency constraint (31) as well as constraints (1), (2), and
(3). Since $A_{\min}$ of the entropy network distribution problem is
analogous to the pattern association criteria of our present pattern
information processing models, these results also imply that we may at
least theoretically formulate a maximum entropy correspondence matrix Q*,
unique and optimal with respect to our pattern association criteria, via
equations (32), (33), and (34) with the parameter $\beta$ set to $+\infty$.
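
As an illustrative sketch only (not the procedure of [18]-[20]), the alternating updates (33) and (34) and the monotonic relation between $\beta$ and A can be exercised numerically. The function names `balance`, `mean_cost`, and `solve_beta`, the stopping tolerance, the bisection bracket, and the small 2 x 2 example are all assumptions introduced here; the bisection merely exploits the monotonic mapping stated above.

```python
import math


def balance(x, z, s, beta, tol=1e-12, max_iter=10000):
    """Iterate (33) and (34) for a fixed beta, then return Q from (32)."""
    m, n = len(x), len(z)
    k = [[math.exp(-beta * s[i][j]) for j in range(n)] for i in range(m)]
    u, v = [1.0] * m, [1.0] * n
    for _ in range(max_iter):
        u = [1.0 / sum(z[j] * v[j] * k[i][j] for j in range(n)) for i in range(m)]  # (33)
        v = [1.0 / sum(x[i] * u[i] * k[i][j] for i in range(m)) for j in range(n)]  # (34)
        # Column sums are exact after (34); stop once row sums also match X.
        row_err = max(
            abs(x[i] * u[i] * sum(z[j] * v[j] * k[i][j] for j in range(n)) - x[i])
            for i in range(m)
        )
        if row_err < tol:
            break
    return [[x[i] * u[i] * z[j] * v[j] * k[i][j] for j in range(n)] for i in range(m)]


def mean_cost(q, s):
    """Left-hand side of constraint (31)."""
    return sum(q[i][j] * s[i][j] for i in range(len(q)) for j in range(len(q[0])))


def solve_beta(x, z, s, target, lo=-20.0, hi=20.0, steps=60):
    """Bisect on beta within an assumed bracket until (31) holds for `target`."""
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if mean_cost(balance(x, z, s, mid), s) > target:
            lo = mid  # mean cost too high: a larger beta is needed
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical data: two residential zones, two employment zones.
x = [0.5, 0.5]
z = [0.3, 0.7]
s = [[0.0, 1.0], [1.0, 0.0]]
beta = solve_beta(x, z, s, target=0.3)
q = balance(x, z, s, beta)
```

The bracket must contain the multiplier for the requested A; a target outside the feasible range of A simply drives the search to one end of the bracket.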

Using theorems developed elsewhere [21] and well known properties
of the Hitchcock model, Evans [20] demonstrates several features of the
matrix Q* and suggests a strategy by which it may be computed. Let E
denote a binary matrix such that $e_{ij} = 1$ for all subscript pairs
(i, j) where $q^*_{ij} > 0$ and $e_{ij} = 0$ elsewhere. (Properties of the
Hitchcock model imply that E will be sparse.) The desired Q* and the
matrix E then interrelate arithmetically in the form

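$$q^*_{ij} = x_i u^*_i z_j v^*_j e_{ij}, \qquad i = 1, \ldots, m; \quad j = 1, \ldots, n \tag{35}$$

where the vector elements $u^*_i$ and $v^*_j$ satisfy the relations

$$u^*_i = \Bigl[\sum_{j=1}^{n} z_j v^*_j e_{ij}\Bigr]^{-1}, \qquad i = 1, \ldots, m \tag{36}$$

$$v^*_j = \Bigl[\sum_{i=1}^{m} x_i u^*_i e_{ij}\Bigr]^{-1}, \qquad j = 1, \ldots, n. \tag{37}$$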

Despite these simple properties of extremal solutions to the entropy
network distribution problem, Evans' method for exposing E, and hence
Q*, requires initial solution of the associated Hitchcock problem,
presumably by traditional techniques.

7.   A  Heuristic  Procedure  for  Determination  of
Planar  Pattern  Dissimilarities

Since, for any pairwise pattern comparison, computation of pattern
dissimilarity via (7) or pattern correlation via (28) is necessarily an
iterative hill-climbing procedure, the particular set of quit
correspondences Q determined at any one iteration can only be stepwise
optimal with respect to the particular transformation R determined
previously at that iteration. Thus, given a convenient procedure for
determining good unbiased approximations of Q, we might choose to
hill-climb using at each step only estimates of extremal quit
correspondences.

One  possible  computational  strategy  for  approximating  extremal 
quit  correspondences  proceeds  as  follows. 

Establish the matrix E such that $e_{ij} = \exp(-\beta \hat{s}_{ij})$,
where $\hat{S}$ is S scaled linearly to have elements within a specified
interval (say between 0 and 1) and $\beta$ is chosen as large as
computational considerations permit (say $\beta = 150$). Initialize V as
(1, ..., 1)' and determine estimates of the vectors U* and V* via
iterative solution of

$$\hat{u}_i = \Bigl[\sum_{j=1}^{n} z_j \hat{v}_j e_{ij}\Bigr]^{-1}, \qquad i = 1, \ldots, m \tag{38}$$

$$\hat{v}_j = \Bigl[\sum_{i=1}^{m} x_i \hat{u}_i e_{ij}\Bigr]^{-1}, \qquad j = 1, \ldots, n \tag{39}$$

and then estimate Q* via

$$\hat{q}_{ij} = x_i \hat{u}_i z_j \hat{v}_j e_{ij}, \qquad i = 1, \ldots, m; \quad j = 1, \ldots, n. \tag{40}$$

Now assuming that $\beta$ is sufficiently large such that $\hat{Q}$ is
close to Q*, we may expect small elements of $\hat{Q}$ to correspond to
zero elements of Q*. Hence we may select for each $\hat{q}_{ij}$ some
threshold value, say $\bar{q}_{ij} = x_i z_j$, and approximate Evans'
binary matrix E of Section 6 above by re-defining $e_{ij} = 0$ wherever
$\hat{q}_{ij} < \bar{q}_{ij}$ and re-defining $e_{ij} = 1$ wherever
$\hat{q}_{ij} > \bar{q}_{ij}$. Then, with this new definition of E
(hopefully Evans' E above), return to iteration of equations (38) and
(39), obtaining new estimates of U* and V*, and compute a final
approximation of Q* via (40).
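
The two-pass strategy just described can be sketched as follows. This is a minimal illustration under stated assumptions rather than the original implementation: the names `scale_unit`, `balance`, and `approx_q_star` are hypothetical, "scaled linearly" is read as min-max scaling to [0, 1], a fixed number of sweeps stands in for an unspecified convergence test, and elements exactly equal to the threshold are kept (the text leaves ties open).

```python
import math


def scale_unit(s):
    """Min-max scale S to [0, 1] (one assumed reading of 'scaled linearly')."""
    lo = min(min(row) for row in s)
    hi = max(max(row) for row in s)
    span = (hi - lo) or 1.0  # guard against a constant matrix
    return [[(val - lo) / span for val in row] for row in s]


def balance(x, z, e, iters):
    """Run `iters` sweeps of (38) and (39) from V = (1, ..., 1)', then apply (40)."""
    m, n = len(x), len(z)
    u, v = [1.0] * m, [1.0] * n
    for _ in range(iters):
        u = [1.0 / sum(z[j] * v[j] * e[i][j] for j in range(n)) for i in range(m)]
        v = [1.0 / sum(x[i] * u[i] * e[i][j] for i in range(m)) for j in range(n)]
    return [[x[i] * u[i] * z[j] * v[j] * e[i][j] for j in range(n)] for i in range(m)]


def approx_q_star(x, z, s, beta=150.0, iters=500):
    """First pass with E = exp(-beta * S_hat); second pass with thresholded binary E."""
    m, n = len(x), len(z)
    s_hat = scale_unit(s)
    e = [[math.exp(-beta * s_hat[i][j]) for j in range(n)] for i in range(m)]
    q_hat = balance(x, z, e, iters)
    # Threshold each element at x_i * z_j; ties are kept (assumption).
    e_bin = [[1.0 if q_hat[i][j] >= x[i] * z[j] else 0.0 for j in range(n)]
             for i in range(m)]
    if any(sum(row) == 0 for row in e_bin) or any(sum(col) == 0 for col in zip(*e_bin)):
        raise ValueError("thresholded E has an empty row or column")
    return balance(x, z, e_bin, iters)


# Hypothetical example: three equally weighted elements per pattern.
x = [1.0 / 3.0] * 3
z = [1.0 / 3.0] * 3
s = [[0.0, 2.0, 4.0], [2.0, 0.0, 2.0], [4.0, 2.0, 0.0]]
q = approx_q_star(x, z, s)
```

Because each sweep ends with (39), the returned matrix always reproduces the column marginals Z; the row marginals X are reproduced only to the extent that the iteration has converged, which cannot happen when the thresholded E admits no matrix with both marginals.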

Now  the  Procrustes  formulation  of  our  pattern  transformation 
problem  provided  a  general  solution  applicable  for  comparison  of  patterns 
of  any  dimensionality.   In  the  case  of  planar  pictorial  patterns,  however, 
the problem may be resolved in a more direct fashion.

Restricting R to be a proper rotation, let C = (W'QY) for a
given Q and write max tr(R'W'QY) in (28) as

$$f(\alpha) = \max_{\alpha}\, \operatorname{tr}\left( \begin{bmatrix} \cos\alpha & -\sin\alpha \\ \sin\alpha & \cos\alpha \end{bmatrix} \begin{bmatrix} c_{1,1} & c_{1,2} \\ c_{2,1} & c_{2,2} \end{bmatrix} \right) \tag{41}$$

or equivalently,

$$f(\alpha) = \max_{\alpha}\, \bigl[ (c_{1,2} - c_{2,1}) \sin\alpha + (c_{1,1} + c_{2,2}) \cos\alpha \bigr]. \tag{42}$$

Then let $A = (c_{1,2} - c_{2,1})$ and $B = (c_{1,1} + c_{2,2})$ and write
(42) as

$$f(\alpha) = \max_{\alpha}\, (A \sin\alpha + B \cos\alpha). \tag{43}$$

Also, let $K^2 = (A^2 + B^2)$ so that $A = K \sin\phi$ and
$B = K \cos\phi$ and

$$f(\alpha) = \max_{\alpha}\, (K \sin\phi \sin\alpha + K \cos\phi \cos\alpha) = \max_{\alpha}\, K \cos(\alpha - \phi). \tag{44}$$

The maximum occurs (at $\alpha = \phi$) as
$K = [(c_{1,2} - c_{2,1})^2 + (c_{1,1} + c_{2,2})^2]^{1/2}$.
The proper rotation maximizing (28) is then determined by the relations
$\sin\alpha = A/K = (c_{1,2} - c_{2,1})/K$ and
$\cos\alpha = B/K = (c_{1,1} + c_{2,2})/K$. Therefore, in comparing any
two planar patterns f and g, a proper rotation R, stepwise optimal with
respect to a given correspondence matrix Q, can be determined directly as
a function of the four elements of the matrix C = (W'QY).
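
The closed-form result can be checked numerically with a short sketch. The function names `stepwise_rotation` and `trace_of_product` and the example matrix are hypothetical, and the returned 2 x 2 array is the rotation matrix as it appears inside the trace in (41); whether that array is R or its transpose depends on the convention adopted at (28).

```python
import math


def stepwise_rotation(c):
    """Return (rotation, K) for a 2 x 2 matrix C, following (41)-(44)."""
    a = c[0][1] - c[1][0]   # A = c12 - c21
    b = c[0][0] + c[1][1]   # B = c11 + c22
    k = math.hypot(a, b)    # K = (A^2 + B^2)^(1/2)
    if k == 0.0:
        # Every angle gives the same trace; the identity is an arbitrary choice.
        return [[1.0, 0.0], [0.0, 1.0]], 0.0
    sin_a, cos_a = a / k, b / k
    return [[cos_a, -sin_a], [sin_a, cos_a]], k


def trace_of_product(r, c):
    """tr(r c) for 2 x 2 matrices."""
    return (r[0][0] * c[0][0] + r[0][1] * c[1][0]
            + r[1][0] * c[0][1] + r[1][1] * c[1][1])


# Hypothetical C; the trace attained by the returned rotation should equal K.
c = [[1.0, 2.0], [3.0, 4.0]]
rotation, k = stepwise_rotation(c)
attained = trace_of_product(rotation, c)
```

Using `math.hypot` avoids forming the squares explicitly, and the zero-K guard covers the degenerate case in which the ratios A/K and B/K are undefined.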

The  two  procedures  above  for  convenient  approximation  of  quit 
correspondences  and  direct  determination  of  stepwise  optimal  rotations, 
in  combination,  yield  a  simple  heuristic  approach  to  measurement  of 
pattern  dissimilarities  for  planar  patterns.   Such  a  procedure  may  be 
programmed as follows:

1.  Input, normalize (see Section 2), and store patterns
f and g in terms of quantizations (W|X) and (Y|Z).

2.  Set a = 1, T = (0,0)', R = I, and $A^{\circ}_{f,g} = M$ (M some
large value).

3.  Compute S via (6).

4.  Approximate Q using the heuristic procedure given above
in this section, and obtain a new estimate of $A_{f,g}$ via (5).

5.  If $|A^{\circ}_{f,g} - A_{f,g}| < 0.01$, stop. Otherwise, let
$A^{\circ}_{f,g} = A_{f,g}$.

6.  Compute a new R via the short method given in this section
and return to Step 3.

8.   Computational  Results 

To  evaluate  the  effectiveness  of  such  a  pairwise  pattern 
comparison  procedure,  the  following  experiment  was  conducted.   A  test 
set  of  ten  prototype  patterns  corresponding  to  the  numerals  1  through  9 
and  0  was  designed.   For  convenience,  the  elements  of  each  prototype 
were chosen spatially coincident with the cells of a 4 x 8 integer grid
and  all  elements  of  all  prototypes  were  assigned  equal  quit  densities. 
Then,  sixteen  noisy  copies  of  each  prototype  were  generated  using  the 
equations 

$$W = a_e (Y R_e) + J T_e' \tag{45}$$

$$X = Z + x_e \tag{46}$$

where J denotes the vector (1, ..., 1)', $a_e$, $T_e$, and $R_e$ are,
respectively, randomly selected scale, translational, and rotational
transformations of prototype spatial coordinates Y, and $x_e$ represents
a random perturbation of prototype quit distributions Z. The ten
prototype patterns selected and the sixteen noisy versions generated for
each are reproduced in Figure 2, where the sizes of individual pattern
elements have been plotted proportional to quit densities.

Figure 3. A computer graphic showing the rank order of prototype-to-
pattern dissimilarity measures computed for the sixteen noisy "6's" of
Figure 2. Individual blocks have been plotted proportional to
$1/A_{f,g}$. Also, prototypes have been ordered from left to right in
accordance with mean prototype similarity with all noisy patterns
depicted.

A Fortran implementation of the above outlined algorithm was
executed on the IBM 360/75 of the University of Illinois to compute the
1600 pattern-to-prototype dissimilarity measures. In every case the
minimum pattern-to-prototype dissimilarity measure occurred when a
pattern was matched with the correct prototype. The total IBM 360/75
CPU time required for computation of all 1600 dissimilarity measures
was 1040 seconds, or approximately 0.65 seconds per comparison. These
computation times may be reducible by more efficient programming.

As typical of the results obtained, a graphical presentation
of all comparisons for the noisy "6's" is given in Figure 3. There,
individual block heights have been plotted proportional to $1/A_{f,g}$
and scaled vertically with respect to the maximum value of $1/A_{f,g}$
occurring within the 160 comparisons. Thus, while exaggerating
proportional differences, the display makes plainly visible the rank
order of all similarities computed by the pairwise comparison procedure.

9.   Conclusions 

Adopting  the  premise  that  patterns  are  their  own  most  valid 
characterizations  and  relying  greatly  on  mathematical  concepts  long 
employed  within  urban  systems  modeling,  we  have  posited  a  new  methodology 
for  direct  quantification  of  pattern  associations  that  should  serve  well 
as  an  alternative  to  conventional  template-matching  methods. 

The  mathematical  bases  of  the  methods  proposed  are  quite 
general.   Wherever  it  is  reasonable  to  represent  patterns  as  spatial 
probability  distributions  of  information,  the  numerical  procedures 
presented  can  be  employed  to  obtain  specific  measures  of  association 
between  patterns.   The  methodology  is  general  with  respect  to  the  spatial 
dimensionality  of  patterns  processed.  Unlike  traditional  correlation 
methods,  moreover,  it  does  not  depend  on  any  fixed  format  or  order  for 
pattern  information  sampling  and  quantization  and,  in  fact,  seems 
relatively insensitive to such considerations.

It is often argued that we stand to gain from research of
abstract  models  of  pattern  information  processing,  not  only  more  general 
methodological  bases  for  technological  advancement,  but  also  additional 
insights  into  the  possible  nature  of  our  own  mechanisms  of  perception 
and  information  processing.   Thus,  the  feature  extraction  procedures  of 
our  analytic  models  of  pattern  recognition  have  their  counterparts 
within  scientific  theories  of  animal  vision.   In  this  context,  we  are 
hopeful  that  the  abstract  models  of  pattern  information  processing 
posited  above  may  lend  additional  support  to  existing  mathematical  and 
logical  bases  for  holistic  mechanisms  within  perception  such  as  those 
cooperative  processes  within  human  vision  hypothesized  and  extensively 
investigated  by  Julesz  [22].   To  the  extent  that  such  reinforcement  may 
be  derivable  from  the  above  abstractions,  we  consider  it  not  without 
significance  that  our  models  of  pattern  communications  have  strong 
relationship  to,  and  indeed  in  this  case  stem  directly  from,  our  models 
of community.